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Abstract — In this paper we describe the state-of-the art status 
of multi-frequency detection techniques for compact sources in 
microwave astronomy. From the simplest cases where the spectral 
behaviour is well-known (i.e. thermal SZ clusters) to the more 
complex cases where there is little a priori information (i.e. 
polarized radio sources) we will review the main advances and 
the most recent results in the detection problem. 

Index Terms — 



I. Introduction 

Extragalactic foregrounds play a crucial role in microwave 
astronomy, not only by their effect as contaminants of the 
Cosmic Microwave Background (CMB) but also by their own 
right as cosmological probes. Galaxies and galaxy clusters, if 
not properly identified and taken into account, can seriously af- 
fect the measurement of the CMB anisotropies angular power 
spectrum in temperature ID-E and polarization B, [|5], CMB 
non-Gaussianity tests l6l- |[T2l and even the performance of 
component separation methods used for the study of Galactic 
foreground^ lfT3lL lfT4l . On the other hand, galaxy and galaxy 
cluster surveys in the sub-mm regime of the electromagnetic 
spectrum are powerful tools for cosmology [IT31— [|T7l . This 
is the motivation of the considerable number of works on 
extragalactic foreground detection that have appeared in the 
literature on recent years. 

As opposed to Galactic foregrounds, that are typically ex- 
tended as diffuse clouds over large areas of the sky, individual 
extragalactic objects appear as compact blobs that subtend 
very small angular scales. For this reason, both galaxies and 
galaxy clusters are often referred to as compact sources and 
their detection/separation is typically treated as a problem 
apart from the one posed by the separation of Galactic diffuse 
components. 

It is precisely the compactness of extragalactic sources that 
makes it possible to detect them against the fluctuations of 
the diffuse components (CMB included) in single-frequency 
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! On the other hand, the opposite is also true: foregrounds can affect the 
performance of compact source detection algorithms. In general, compact 
source detection algorithms find it easier to deal with diffuse foregrounds than 
diffuse component separation techniques to deal with compact sources, so the 
typical CMB analysis pipeline includes the detection of compact sources as 
a previous step to diffuse component separation. 



(channel) settings. Most of the detection methods that have 
been proposed in the literature make use of this scale diversity. 
The well known SExtractor package |0jD, for example, is 
particularly good at estimating and then subtracting the back- 
ground at coarse scales and then detecting compact sources 
by looking for small sets of connected pixels above a given 
threshold. For the same reason, techniques based on scale- 
selecting devices such as band-pass filters llT9ll-lf24l and 
wavelets l25lh l26l have proven to be very useful. More 
sophisticated detection Bayesian algorithms [27l- ll3Tl make 
also use at some point of the scale diversity of compact sources 
versus diffuse foregrounds. A recent review on methods for 
the detection of compact sources in microwave images can be 
found in l32l . 

The majority of current CMB experiments, such as the 
Wilkinson Microwave Anisotropy Probe (WMAP) [33] and 
Planck l34lh can observe the sky in several frequency bands 
simultaneously. Some of them are also capable of measuring 
not only the intensity of the microwave radiation, but also its 
polarization. This means that extragalactic compact sources 
can be observed in several different images or channels. Multi- 
channel! information can be used to improve the chances of 
detecting compact sources. In this paper we will review the 
state of the art multi-channel detection techniques for compact 
sources in microwave astronomy. 

Multi-channel (multi-frequency) astronomy is almost as 
old as the telescope itself: we have been doing it since 
the first colour filters became available. During the XX th 
and XXI st centuries astronomy has expanded its range of 
operation into the radio, microwave, infrared, ultraviolet, X- 
ray and gamma regimes of the electromagnetic spectrum: a 
tremendous amount of information we are only beginning to 
piece together. The most basic multi-channel consideration 
we can do is: given that we have observed some object in 
the optical, let us see how does it look like in, for example, 
the infra-red. Catalogue cross-correlation and band-merging, 
follow-ups and the construction of spectral energy distribution 
(SED) curves are fundamental parts of this multi-channel quest 
for knowledge. But we are not going to review this. 

A follow-up -looking at a certain position of the sky where a 

2 In this work we wil use the term multi-channel detection instead of the 
more common multi-frequency term because it can accommodate both the 
classic problem of detecting a source using different frequency maps and the 
detection of a source in polarization using images of the Stokes' parameters 
/, Q, {/. As we will see, both problems are formally equivalent and therefore 
we prefer the more general terminology. 
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particular source has been already detected in a different region 
of the electromagnetic spectrum- can only work if we have 
a clear detection of the source in the first place. Sometimes, 
however, the sources are just too faint to be detected with 
enough significance in any of the available channels. But 
maybe if we could join together all the channels we would be 
able to detect the source. Even in the cases where the source 
is clearly detected in one channel, it may be too weak to be 
observed in other channels. Once again, one can ask if there 
is any possibility to use other information apart from just the 
source position from the 'good' channels in order to enhance 
the source in the 'bad' ones. Or we can more generally ask if 
there is a better way to combine all the channels to get better 
SEDs than just going channel by channel separately. This is 
the aim of our review. 

The key for multi-channel source separation is spectral 
diversity: we hope the signal of interest to scale from one 
image to other in a different way than the other components 
(background). Assuming a linear mixture model in which TV 
different, unknown sources are added with a set of channel- 
dependent weights to produce M observed channels (M > 
N), and making certain assumptions about the statistical prop- 
erties of the sources -for example statistical independence, 
or non-Gaussianity, etc- the Blind Source Separation (BSS) 
problem is solvable by means of a number of statistical 
signal processing techniques such as the Maximum Entropy 
Method (521, Independent Component Analysis l36l . Corre- 
lated Component Analysis (221, (38l or wavelet-based Internal 
Template Fitting (39l , just to mention a few of them. For a 
more detailed review of BSS methods applied to microwave 
experiments, see |[T4l . 

Unfortunately for our purposes the above mentioned BSS 
techniques are not well suited for the detection of extragalactic 
compact sources, except for the particular case of galaxy 
clusters observed through the thermal Sunyaev-Zel'dovich 
(tSZ) effect. Individual galaxies leave their imprint on the 
microwave sky through an enormous variety of astrophys- 
ical mechanisms -from radio active lobes to dust thermal 
emission- so that, strictly speaking, each individual galaxy 
has its own unique spectral behaviour. In the linear mixing 
model this translates into N ^> M and the BSS problem is 
sorely under determined. New methods, specifically tailored 
for compact sources, become necessary. Even in the case of 
tSZ clusters, where all the sources share the same spectral 
behaviour, it may be advisable to apply other techniques 
different from the above mentioned BSS methods. The tSZ 
is sub-dominant at all frequencies, making it very difficult for 
BSS techniques to detect but the brightest clusters in the sky. 
The multi-channel detection methods we are going to describe 
in this review make use, up to different extent, of both scale 
and spectral diversities in order to optimize the detectability 
of compact sources. 

In this paper we will review the most widely used meth- 
ods for detecting extragalactic compact sources using multi- 
channel data. We will first formalize the problem in section |TTJ 
Then we will review the different approaches currently used 
in microwave astronomy. We choose the order in which we in- 
troduce the methods on the basis of the amount of information 



that is used to achieve detection: we will start in section [III 
with traditional stacking and band-merging techniques that 
make a minimal use of multi-channel information and then 
proceed in section |IV] to a linear filtering scheme that takes 
into account the correlation of the background among different 
channels. In section [V] we will discuss how to filter the data 
when both the correlation of the background among channels 
and the spectral behaviour of the sources are known. We 
will see how the second condition can be relaxed. Then we 
will discuss the more general theory of Bayesian detection in 
section |Vll Finally, we will devote section Ivnl to the particular 
case of polarization data. 

II. Compact sources in multi-channel 

OBSERVATIONS 

Let us consider a set of N two-dimensional images (chan- 
nels) in which there is an unknown number of compact 
sources embedded in a mixture of instrumental noise and other 
astrophysical components. Without loss of generality, let us 
consider the case of a single source with a certain typical 
angular scale R and located at the origin of the coordinates. 
Our data model is 

Dj (x) = Sj [x] R) + rij (x) , (1) 

where the subscript j = 1 , . . . , N denotes the index of the 
channel: it may refer to a given frequency (i.e. 30 GHz, 44 
GHz, etc), to a polarization channel (i.e. the Stokes' param- 
eters I,Q,U) or any other image indexing we can consider. 
The scale R of the sources can vary from one object to other 
(as it happens for galaxy clusters) or be the same for all the 
objects belonging to a given class (i.e. radio sources observed 
by low angular resolution experiments such as WMAP). It 
is common to factorize the source term as the product of a 
channel-dependent amplitude or intensity and a spatial profile 

sj (x) = Aj x Tj (x; R) . (2) 

The spatial profile, in turn, includes the possible effects of any 
channel-dependent point spread function (beam): 

Tj {x ; R) = bj (x) <8> r° {x ; R) . (3) 

The above formula can be expressed more easily in Fourier 
space: 

T J (k ] R)=b J (k)xT^ (4) 

where for simplicity we have kept the same symbol for the 
functions in real and in Fourier space (the argument x or k 
indicates clearly enough in which space we are). We choose 
the normalization of the profile so that 

T°(o ; i?) = 1. (5) 

Finally, for simplicity we will consider symmetric beams and 
profiles, Tj (x;R) = Tj (\x\; R) = Tj(x;R). This assumption 
is not strictly necessary and can be relaxed on demand, but it 
greatly simplifies calculations and is quite reasonable in most 
applications. 

The term rij (x) in equation (Q]) is the generalized noise in 
the j th channel, containing not only instrumental noise, but 
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also CMB and all the other astrophysical components apart 
from the compact sources. Let us suppose that the noise has 
zero mean and that it can be characterized by its cross-power 
spectrum: 

( nj (fc) n: (fc') ) = P jt (k) S 2 (k - k') . (6) 

In other words, we work under the assumption that the 
properties of the noise can be sufficiently described by its 
second-order statistics. Please note that the homogeneity of the 
background is a necessary condition for the power spectrum 
to be a full second order description of the background; this 
condition is not globally met in real astronomical images 
(where the Galactic foregrounds, for example, are strongly non 
homogeneous), but we can always take patches small enough 
to satisfy local homogeneity (at least on a first approximation). 

A. Notation 

In this paper we will use the notation x to indicate coordi- 
nate vectors (both in real and Fourier spaces) and the notation 
x to indicate vectors whose components are labelled with the 
indexes of the different channels. According to this notation, 
we can write equations 0]-[6]) as 

D (x) s (x; R)+n (x) 

s (x) = At (x; R) 

r(k;R) = b(k)r°(k;R) 

(n* (fc) n = P(k)S 2 (k - A?) . (7) 

It will be useful in some cases to expand the vector of 
intensities as 

A = I J , (8) 

where I* can be interpreted as a channel-independent intensity 
and f is a vector containing the spectral behaviour of the 
sources across the different channels. For example, when 
we consider the tSZ I* can be associated with the cluster 
Compton-y parameter and f takes the well-known form, in 
thermodynamic temperature units, 

e° + 1 

/,ocz>-^--4 (9) 
e v — 1 

where v = hv/kT, v is the frequency of observation, k is the 
Boltzmann constant and T is the temperature of the CMB. 
Other example is provided by radio sources whose spectral 
behaviour can be approximated by a power law 

fu oc (j^j , (10) 

where v§ is some frequency of reference and a is known as 
the spectral index. In this case, /* can be interpreted as the 
source flux density at the reference frequency vq. Please note 
that analytic spectral laws such as © or (TTTTb are not always 
available, or even convenient. 

A list of symbols used in this paper can be found in table U 
that appears in the Appendix. 



III. Basic multi-channel detection 

Probably, the simplest imaginable approach to the multi- 
channel problem consists in trying somehow to transform 
it to a much simpler single-channel problem. The theory 
and applications of single-channel source detection are well 
studied in the literature; we will assume in this work that the 
reader is already familiar with the topic. A short, recent review 
can be found in [|32l and references therein. 

Let us start with the simplest -and most frequent- case 
of multi-channel observation we can conceive. Imagine we 
have just two independent observations D\ and D2 of the 
same source, taken at the same frequency and with the same 
instrumental characteristics (beam size, noise level, etc). A 
perfect example is any pair of identical radiometers in an 
experiment such as Planck. Let us also assume that the two 
observations are simultaneous, or at least that the source 
under study has not experienced variability in the time passed 
between observations. Then, it is well known that if the noises 
of the two observations are Gaussian with rms g\ = a 2 = cr 
and independently distributed, the linear combination D = 
(Di has a lower rms noise a = a /y/2. If the different 

channels have different noise levels but the noises are still 
uncorrelated, another well-known result is that the noise of 
the linear combination of the channels can be minimized by 
inverse noise variance weighting of the individual channels. 
Once the multi-channel data has been combined (or projected) 
into a single channel (or plane) it is straightforward to apply 
threshold detection, or filtering plus thresholding, or any other 
of the techniques described in ll32l . With the appropriate mod- 
ifications, the weighting internal linear combination scheme 
can be extended to correlated noise [40 1. The problem with 
this approach is that if the source has not the same intensity 
in all the channels (or if its spectral behaviour is not perfectly 
known) it is impossible to relate the intensity observed in the 
combined image with the true intensity of the source. 

Other merging schemes can be advisable in some other 
circumstances. Imagine that some of the components of the 
noise in (Q} are constant across all the channels: the best exam- 
ple is the CMB itself in images expressed in thermodynamic 
temperature units. Then an obvious way to reduce the noise 
is to subtract pairs of channels. As an example, [41] and [|42l 
used combinations of the WMAP W and V bands in order 
to produce a CMB -free map to better detect the elusive radio 
galaxies in the WMAP 5 -year data. The problem with this 
approach is that if a source has by chance a spectrum flat 
enough, it may be also cancelled by the subtraction. 

IV. Matrix Multi-Filters 

A recurrent problem of channel linear combination is that 
accurate photometry -that is, the determination of J*- is not 
possible unless the exact values of all the components of the 
spectral behaviour f are known. But the spectral behaviour 
is not known in most cases. General laws such as (ITOb are 
not reliable when the range of frequencies involved is wide 
enough, and even if they are there is the additional problem 
that different sources in a given patch of the sky will have 
different spectral indexes. The classic solution in these cases, 
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Fig. 1. Comparison between single-channel matched filtering and multi-channel MTXF filtering for a four-channel simulation of a patch of the sky as 
observed by the Planck 30, 44, 70 and 100 GHz detectors. The top row of images show the simulated patches. The second row shows just the point sources 
present in the simulations. The third row shows the patches after being filtered with single-channel matched filters. The fourth row shows the patches filtered 
with the corresponding MTXF. The x and y-axes are in pixel units (pixel size = 3.435 arcmin) 



if one wants to persist using linear combination nevertheless, is 
to go back to the individual channels and extract photometric 
measurements on them. 

In order to overcome these problems [44] proposed 

a method that combines the signal-to-noise boosting capacity 
of linear filters with a particular merging scheme that makes 
totally irrelevant the spectral behaviour f. The method is 
called matrix multi-filters (MTXF) and its foundations can be 
summarized as follows: 

Let (x) be a set (matrix) of N x TV linear filters, and 
let us define the set of quantities 

3 J 

The quantity above is the sum of a set of linear filterings 
of the individual channels. We intend to use the combined 
filtered images Wi (x) as estimators of the source amplitudes 



Ai. For that purpose, we require that the filters \E^j satisfy the 
conditions: 

• The combined filtered image at the position of the source 
Wi (o^j is an unbiased estimator of Ai. 

• The combined filtered image at the position of the source 
Wi (^j is an efficient estimator of Ai, that is, the variance 
of the estimator is minimum. 

Looking at the structure of (fTTb it is easy to verify that a 
sufficient condition in order to guarantee the first condition 
above is 

J dy ^ij (x - y) Tj (y) = % (x) , (12) 

that is, the filter functions are orthonormal to the source profile 
functions. This guarantees statistical unbiasedness indepen- 
dently of whatever values f may take. The solution to the 
problem given the two conditions above and the constraints 
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(fT2l) is, in Fourier space 

** = AP _1 , (13) 
where * denotes complex conjugation and 

A = H 1 

Him = J dkn (k) P[~J (k) T* m (k) . (14) 

In the case where P is diagonal (no noise correlation among 
channels), it can be shown that the matrix multi-filters default 
to a simple matched filter applied individually to each channel. 

The MTXF do not project TV channels into a single effective 
plane where to detect the sources. They project instead N 
channels into N new planes where the sources can be detected 
and their N amplitudes A{ (i = 1, . . . , N) can be estimated 
separately. The difference with the single-channel approach 
is that each one of the N new planes Wi is constructed 
by combination of the original N channels in such a way 
the noise is cancelled more effectively: the MTXF use the 
multi-channel cross-correlation of noise but do not use at all 
any information about the spectral behaviour of the sources. 
It can be described as a 'semi multi-channel approach', in 
the sense that it uses only half of the available information. 
When the noise cross-correlation among channels is zero, the 
method defaults to standard single-channel matched filtering. 
But when the cross-correlation is not null, the MTXF give 
better signal-to-noise boosts than the standard single-channel 
matched filter. Figure [T] shows the comparison between the 
single-channel matched filter and the MTXF for a simulation 
of the Planck 30, 44, 70 and 100 GHz channels. Note that 
for the 44 and 70 GHz channels the output matrix filtered 
maps look far cleaner than their matched filtered equivalents. 
For the 30 GHz channel, the distinction is not so clear, but 
some improvement can be appreciated nevertheless. Finally, 
for the 100 GHz channel both filtered images look practically 
identical. The gain factors obtained for these images with the 
MTXF are [2.9, 3.8, 3.5, 2.8] for the [30, 44, 70, 100] GHz 
channels. The signal-to-noise gain ratio between the MTXF 
and the matched filter is GMTXF/GMF = [1.38, 1.52, 1.49, 
1.00] for the [30, 44, 70, 100] GHz channels. Note that in one 
of the channels (100 GHz) the signal-to-noise gain is equal to 
one; although this is not always the case, the MTXF tend to 
default to the standard matched filter for some of the channels 
when the multi-channel mixing does not add information in a 
constructive way: this is often (but not always) the case for 
the Planck 100 GHz when combined with the lower frequency 
channels, that have worse angular resolution and higher noise 
levels. This cannot be taken as a general rule, because the 
conditions change from one region of the sky to another. 
Only in the worst case the 100 GHz MTXF filtered image 
is as good as the single-channel MF filtered image. Regarding 
the number of true and spurious detections produced by both 
methods Figure [2] shows the receiver operating characteristic 
(ROC) curves for the MTXF and the MF in the four considered 
channels; a clear improvement can be appreciated at 30, 44 
and 70 GHz, whereas both methods work similarly for thr 
100 GHz case. This serves as an indication of how MTXF 
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Fig. 2. ROC curves for the filtering schemes and the four channels 
considered. Solid line: MF. Dot-dashed line: matrix filters. Axis labels: TPR 
stands for True Positives Ratio (associated to the completeness of a catalogue), 
FPR stands for False Positives Ratio (associated to the purity of the catalogue, 
a low FPR indicates a high purity). 

can produce results better or at least equal to single-channel 
matched filters without making any use of the sources spectral 
diversity. 

V. Matched Multi-Filter 

Multi-channel detection techniques reach the summit of 
their power when they fully exploit the spectral diversity of 
compact sources with respect to the background, he fully 
multi-channel compact source detection techniques that are 
being applied in the context of CMB astronomy lie in two 
main groups: linear filtering methods based on the so-called 
matched multi-filters plus some thresholding detection criteria 
and Bayesian methods, to be discussed in the next section. 

A. Standard matched multi-filters 

Let us assume that the spectral behaviour / of the sources 
is known a priori. An example is the well-known thermal 
Sunyaev-Zel'dovich effect which, ignoring relativistic effects, 
has a spectral behaviour given by ©. This opens interesting 
possibilities. For example, if f is known, it is straightforward 
to stack the different channels with an optimal weighting 
designed to minimize the effects of the background while 
keeping intact the intensity of the sources R31 : 

D (x) = Wi Di (x) = w*D (x) , (15) 

i 

such that if there is a source of intensity /* located at the 
origin, then 

(£)($))= I*. (16) 

The solution to this kind of internal linear combination of 
channels is given by the generalized eigenvalue problem B31 

(G-AM)w = 0, (17) 
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where the elements of the matrices G and M are given by 



Gij 



(rii (x)nj(x)). 



(18) 
(19) 



It is evident that M is a measure of the cross-correlation of 
the noise among the different channels. The combination dT3T) 
using the weights obtained through (1771) gives the optimal 
internal linear combination (ILC) map for the detection of 
sources with the spectral behaviour f . The ILC map can be 
further processed using a single-channel matched filter as 
suggested by B31 . In an independent work, [40] combined 
simulated multiwavelength maps in order to increase the 
average signal-to-noise ratio of galaxies assuming that their 
spectral behaviour could be modelled by expressions such as 
(ITQb with a known spectral index. 

In the previous ILC method, however, the separation be- 
tween linear combination and filtering seems somewhat ar- 
tificial. The process of filtering and projecting into a single 
effective plane can be achieved in a single step by means of 
the so-called matched multi-filters [|45l-ll48l. Let us define the 
effective filtered plane as a sum of optimally filtered images 



(20) 



where the ipi are N linear filters depending on a certain scale 
parameter S. The meaning of this scale parameter will become 
evident shortly. If we impose to the effective filtered plane D 
the usual unbiasedness and efficiency conditions, that is 

• At the position of the sources, D is an unbiased estimator 
of /*. 

• The variance of D is minimum. 

The solution, as proven in R31 . is given in Fourier space by 

i) (jfc; S) = a(S) P" 1 (fe) F (jfe; S) , (21) 
a-\S) = J dk F £ (jfe; P 1 (Jtj F (jfe; , (22) 

where the vector F has components Fi (k\S^j = (^k;S^j. 
We show explicitly the dependence on S for a reason that will 
be clear immediately. The filters defined by (|2H take the name 
of matched multi-filters (MMF). 

Just to show the power of the MMF, in Figures [3] and [4] 
we show the unfiltered channels of a typical simulated SZ 
observations and the corresponding filtered plane. Figure 
shows a small part of a realistic sky simulation obtained with 
the Planck Sky Model (PSM) package | 49] for the Planck HFI 
frequencies of 100 143, 217, 353, 545 and 857 GHz. There 
is a bright galaxy cluster in the center of the images, but it is 
very hard to spot it visually. On the other hand, the presence 
of the cluster is evident in the MMF-filtered image shown in 
Figure @] 

B. Objects with different size 

Until now, we have not made reference to the scale R of 
the sources that appeared in equations (HI)-©. In sections [TTTI 
and [iVl we have implicitly assumed that all the sources shared 
the same basic profile tq. This is pretty well the case of 
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Fig. 3. Simulated patches of the sky at 100, 143, 217, 353, 545 and 857 
GHz. A bright galaxy cluster is in the centre of the patches, but it is almost 
invisible among the CMB and Galaxy fluctuations. 
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Fig. 4. The same area of the simulated sky as in Figure [3] after filtering 
with MMF. Now the galaxy cluster is clearly visible. 



galaxies in low resolution experiments such as WMAP and 
Planck: the angular scale of these objects is typically far 
smaller than the instrument psf and therefore all of them can 
be safely considered as point sources, with observed profiles 
that are basically equal to the observing beam. In that case, 
we can effectively forget about the scale R and assume that 
is something intrinsic to the profile To that does not require to 
be made explicit. 

But this cannot be the case for galaxy clusters: many of 
them are resolved objects even at the relatively low angular 
resolutions of Planck l50ll . Detection algorithms must there- 
fore provide not only a list of positions and intensities, but 
also of sizes of clusters. Moreover, they should give accurate 
photometry (values of /*) for all the range of possible object 
sizes. MMF can be adapted to do precisely this in a very 
simple manner. 

Let us assume that all the compact sources in a given multi- 
channel image have the same spatial profile except for a scale 
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parameter R. For example we may consider the universal 
galaxy cluster pressure profile given by [1571 with a different 
contrast radius R$oo for each cluster in our sample. If the j th 
cluster has a true value of its scale parameter equal to Rj, 
then is it straightforward to prove that the filters (BTl satisfy 
the conditions of unbiasedness and maximum efficiency if and 
only if their scale is S = Rj. Moreover, as a part of the 
demonstration it can be shown that the signal-to-noise boosting 
given by the filter for that particular cluster is maximum 
also if and only if S = Rj. This suggests a very simple 
detection/estimation algorithm: 

1) Filter the multi-channel image with a set of filters (ITil 
with different scales Si. 

2) Select as cluster candidates the positions of the maxima 
of the filtered images above a certain threshold. 

3) For each cluster candidate j, obtain the curve showing 
the signal-to-noise boosting versus the scale Si. Find the 
location S max (j) of that curve. 

4) Make Rj = S max (j). 

C. Unbiased matched multi-filters 

Galaxy clusters interact with CMB photons not only through 
the thermal SZ effect, but also through the so-called kinematic 
SZ effect due to the proper motion of the cluster. This is a 
perfect example of a case in which the same object produces 
two signals with identical spatial distribution but different 
spectral behaviours. The mixing of the two signals can affect 
negatively the determination of each of them. Normally the 
kinematic effect is at least one order of magnitude fainter 
than the tSZ and therefore the bias introduced by it in tSZ 
measurements can be neglected. But the contrary is not true: 
the tSZ can contaminate significantly the estimation of the 
kinematic effect. In order to avoid this [|46l proposed a 
modification of MMF specifically tailored to cancel out this 
bias. 

Let us consider a case in which we wish to observe the 
kinematic SZ effect without being disturbed by tSZ. In ther- 
modynamic units the vector f = [/i, . . . , /n] of the thermal 
effect is given by equation (|9]). The spectral behaviour of the 
kinematic effect is, in the same units, flat: [1, . . . , 1]. We look 
for a set of N filters such that the estimation of 

the kinematic effect is not affected by the thermal. A sufficient 
condition for this is 

J dk = 1 

J dk F £ $ = 0, (23) 

(24) 

which can be interpreted as a kind of orthogonality with 
respect to the spectral behaviour laws of the components. The 
solution, when we add the maximum efficiency constraint, 
looks similar to the MMF: 

* = IP" 1 (-0F + or) . (25) 



In this equation the constants a, j3 and A are given by 



a = 


J 


f dk F*P 


_1 F, 


a = 


J 


f dk r*P 




7 = 


1 


f dk r*P 




A = 







Similarly, it is possible -albeit less interesting- to design 
filters that give the thermal SZ effect while cancelling the bias 
introduced by the kinematic effect. Both families of filters are 
called unbiased matched multi-filters (UMMF). 

D. Unknown spectral behaviour 

The prior knowledge of f is the fundamental key of the 
success of MMF and the reason why they are able to get 
high signal-to-noise ratio boosts with respect to the individual 
channels. However, as we mentioned before this prior knowl- 
edge is only certain when considering the SZ effect (and while 
ignoring relativistic effects). Extragalactic radio and infrared 
sources are not that simple to model. One possible solution 
is to group sources in broad families -radio flat, radio steep, 
dusty galaxies of a certain type, etc- and to define average 
spectral laws, such as ([TOi for each family. This approach 
will not be optimal for individual sources and will certainly 
introduce biases in the photometry, but at least is expected 
to improve the number of detections with respect to single- 
channel detection. 

Fortunately MMF allow us to solve the conundrum in a very 
elegant way by means of an adaptive filtering scheme similar 
to the one described in section IV-BI 

Now imagine that f describes the true (unknown) spectral 
behaviour of a source and that g = [gi], i = 1, . . . , N is a 
new vector of equal size as f , but whose elements can take 
any possible value. We can define the MMF for vector g: 

l/>g = OLg P^G 

a' 1 = J dk G'P^G 

Gi = gin, (27) 

where all the definitions are in Fourier space and we have not 
written down the explicit dependence on k for simplicity. If 
we apply the filter (1271) to a source with true spectral behaviour 
f and true intensity we will get 

I g = D(0) = ha g J dk G*P _1 F, (28) 

so I g 7^ I* unless g = f . This is no help since we do not know 
the true value I* . But it can be shown l52ll that the quantity 

SNR q = ^, (29) 

where a 1 is the variance of the image filtered with (1271 ), is 
maximum when g = f . SNR g is obviously the signal-to-noise 
ratio of the source after filtering. This allows us to reproduce 
the same kind of algorithm that was used in section IV-BI to 
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detect the sources and estimate simultaneously their position, 
intensity and spectral behaviour (instead of their scale). The 
only difference here is that while in section IV-BI there was 
only one parameter per cluster to be determined (the scale), 
here we have N — 1 parameters per cluster (components of 
f ) to determine. The method has been recently applied to the 
two highest frequency channels of WMAP (53), leading to a 
new catalogue of 157 objects detected at the 5a level at 94 
GHz, which is a substantial improvement over the WMAP 
Five-Band Point Source Catalogue. 

VI. Bayesian multichannel detection and 

ESTIMATION 

All the multi-channel detection methods above have a 
common two-step methodological approach: in a first moment 
the data are somehow pre-processed and then objects are 
detected by means of some criterion -typically thresholding- 
that hopefully separates them from the background noise. 
For the pre-processing step, one chooses beforehand a given 
tool (e.g. linear combinations of channels, or a given kind of 
filter/wavelet) and adjusts a small number of parameters (e.g. 
the weights of the linear combination, the scale of the wavelet, 
etc) according to some optimality criterion (e.g. unbiasedness, 
maximum efficiency). For the detection step, an arbitrary 
threshold is chosen with the hope to reach a compromise 
between minimizing the number of spurious detections and 
maximizing the number of true detections. A third step for 
the estimation of a number of parameters of the sources 
(e.g. intensity, size...) can be attempted after detection or, 
in some cases, simultaneously to itj (as it occurred with 
MMF). Errors to the estimated parameters are calculated by 
means of some prescription such as Fisher analysis. All these 
steps are somewhat arbitrary and are focused only in the 
statistical properties of the background (and the deterministic 
spectral behaviour of the sources, in the case of multi-channel 
detection). This approach leaves out all probabilistic (and 
potentially useful) information about how many sources are 
expected to be found above a given flux limit, how are they 
distributed as a function of intensity, how many classes of 
sources are there and in which proportion are they present in 
the data, etc. 

The Bayesian system of inference is the only one that 
provides a consistent extension of deductive logic to a broader 
class of degrees-of-belief by mapping them into the real 
interval [0,1] |54|. Bayesian inference provides a logically 
consistent way of tackling the detection problem as a part 
of the decision theory, while incorporating a probabilistic 
description of the sources as a priori information in a natural 
way. The Bayesian framework also provides a sensible detec- 
tion criterion through the Bayesian posterior odds ratio fl54ll . 
l55l . In estimation, Bayesian methods give a full description of 
the a posteriori distribution of the parameters given the current 
data, thus allowing us to obtain expectation values, confidence 
level contours and any other statistics of interest. 

3 Even if detection and estimation can be done at the same time under 
certain circumstances, it must be noted that in statistics they are fundamentally 
different concepts. 



A detailed description of multi-channel Bayesian detection 
of compact sources is given in l55l . Here we will just 
summarize the main theoretical aspects of the problem. Let 
us start with B ayes' theorem 



Pr(Q\D,H) = 



Pr (D|0,iT) Pr(G\H) 



(30) 



Pr(D\H) 

where D is the vector of the observations as in eq. and is 
the vector contains all the relevant parameters (positions, 
intensities, sizes, etc) to the detection problem and H is 
the underlining hypothesis. In the usual Bayesian terminol- 
ogy, Pr(&\D,H) is the posterior probability distribution 
of the parameters, Pr(D\®,H) = £(0) is the Likelihood, 
Pr (Q\H) = Pr(0) is the prior and Pr (D\H) = Z is the 
Bayesian evidence. The most simple detection scenario can be 
described as a decision between two incompatible hypothesis 
Hi (there is a source) and Ho (there is not a source). In this 
case, it can be shown l54li . l55l that the decision criterion that 
minimizes the expected loss (that is, that optimizes the balance 
between the number of true and false detections) is given by 
the following condition on the Bayesian odds ratio 



In 



PrjHi\D) 
Pr(H \r>)\ 



In 



Z 1 Pr(#i 



[Z Pr(H )\ 



#0 



(31) 



where £ is a threshold that is fixed for any given loss function. 
The choice of the loss function depends on the particular 
setting of the experiment; a common custom in microwave 
astronomy is to give identical weight to a spurious detection 
than to a missing true detection. 

A. Likelihood 

The form of the likelihood is determined by the statistical 
properties of the generalised noise (background sky emission 
plus instrumental noise) in each frequency channel. If the gen- 
eralised noise is statistically homogeneous it is more conve- 
nient to work in Fourier space, since there are no correlations 
between the Fourier modes of the generalised noise. Please 
note that although Galactic emission is not homogeneous, one 
can always operate locally in patches small enough to make 
the homogeneity assumption valid (as a first approximation). It 
is also common to assume that both the background emission 
and instrumental noise are Gaussian random fields. This is 
a very accurate assumption for instrumental noise and the 
primordial CMB, but more questionable for Galactic emission. 
Under the previous assumptions, the likelihood ratio between 
the hypotheses Hi and Hq can be written as 



In 



W0) 



^D'^P-^s^;©) 

k 

- s*(£; 0)P-^)s(£; 0).(32) 



Note that the effect of the products X £ P _1 Y, with X and 
Y generic vectors, is to project the multi-channel data into 
one single equivalent channel (or plane), as it happened in the 
MMF and UMMF. The similarities are more profound. It is 
shown in [55 ] that, when the source term s is written as the 
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sum of N s compact sources distributed across the image, the 
likelihood ratio (l32t can be split into two parts: 

• A sum of 'auto-terms' each containing just the parameters 
corresponding to the i th source (i = 1,...,N 3 ) 

• A sum of 'cross-terms' with mixed parameters corre- 
sponding to the i th , j th sources (i, j = 1, . . . , N s ) 

The cross-term goes quickly to zero when the sources are 
well separated, but must be taken into account when source 
blending is frequent (crowded fields). It is worth noting that 
maximising the likelihood ratio (l32k in the absence of the 
cross-term (negligible source blending), with respect to the 
source intensities leads precisely to the MMF described 
in section [V] This is the multi-channel generalization of the 
well-known result that matched filtering is the solution of the 
generalized maximum likelihood test (GLRT) for Gaussian 
noise and a single source ll32l . [ 56 1 . 



B. Priors 

If the data model provides a good description of the 
observed data and the signal-to-noise ratio is high, then 
the likelihood will be very strongly peaked around the true 
parameter values and the prior will have little or no influence 
on the posterior distribution. At the faint end of the source 
population, however, priors will inevitably play an important 
role. Moreover, since for most cases in astronomy the faint 
tail overwhelmingly dominates the population, the selection 
of the priors becomes important and has to be addressed 
very carefully. Physical (informative) priors are particularly 
useful when addressing the detection problem, whereas non- 
informative priors can be more adapted to the task of parameter 
estimation once the sources have been detected [ 55 1 . For multi- 
channel detection, the list of priors to be considered must 
include 

1) Prior on the positions: For extragalactic sources, it is 
reasonable to assume a uniform prior. Clustering can be a 
problem, particularly for dusty galaxies observed at v > 300 
GHz. 

2) Prior on the number of sources: Following the same 
rationale of local uniformity, i.e no clustering, the probability 
of finding N s objects (above a given flux limit) in a sky patch 
follows a Poisson distribution 

n(N s ) = Pr(N s \X) = e~ x — , (33) 

where A is the expected number of sources per patch. 

3) Prior on intensity: A usual informative prior on the dis- 
tribution in intensity of extragalactic sources is a Generalized 
Cauchy Distribution such as in [31]. This provides a good 
model for the observed distribution of fluxes, fitting the de 
Zotti or Tucci models almost perfectly 0, 0. Moreover, 
the distribution can be properly normalised as required for 
evidence evaluation. For the case of galaxy clusters, a power 
law distribution fits well the cluster populations assuming 
a standard WMAP best-fit ACDM cosmology (53 and the 
Jenkins mass function [58]. 



4) Prior on the sizes: Point sources are best modelled by 
imposing the prior ir(R) = S(R) on the radius. For galaxy 
clusters, the fact that a significant number of them can be 
resolved must be taken into account. In (55l a truncated 
exponential law is found to fit the simulated catalogues very 
well. 

5) Prior on the spectral parameters: For galaxy clusters 
observed through the thermal Sunyaev-Zel'dovich effect, the 
spectral behaviour can be safely fixed to ©, if we ignore 
relativistic effects. For point sources, however, the fact that 
each source has a spectral behaviour that is different from 
all the others must be accommodated. Either if we use power 
laws such as (ITUb for radio sources or grey body laws for dusty 
galaxies, there is at least one spectral parameter that changes 
from one source to another. Informative priors can be derived 
from the available source counts models 0, 0, or uniform 
priors can be used when physical models are not available. 

6) Priors on the models: If there is more than a kind 
of source in the data (e.g. we have radio sources, dusty 
galaxies, Galactic compact sources and galaxy clusters in the 
same image) then appropriate priors in the probability of the 
different hypotheses Hj should be included in the Bayesian 
framework. More importantly, the Bayesian framework allows 
us to explicitly consider the probability of background fluctu- 
ations above a certain level in order to optimize the decision 
between the hypotheses Hj and the null hypothesis Hq (i.e. 
no source). 

C. Bayesian detection methods in the literature 

Solving the Bayesian equations for object detection and 
parameter estimation typically implies sampling from a very 
complex posterior distribution with variable dimensionality 
(dependent on the number of objects). Thus the main problem 
that Bayesian methods need to overcome is the computa- 
tional burden of evaluating integrals over the posterior and 
its marginals. This can be avoided in some cases by semi- 
analytical Maximum A Posteriori evaluation ||3T1L but if a 
full analysis of the posterior is required then sampling is 
unavoidable. Typical implementations include Monte-Carlo 
Markov chain (MCMC) algorithms [|27l and posterior refine- 
ments such as nested sampling (28lk l59ll . lf29ll implemented 
a simultaneous multiple minimization code based on Powell's 
direction set algorithm [60], that is generalized to the multi- 
channel case in (55). A full description of that algorithm and 
its catalogue making procedure it out of the scope of this 
review. 

VII. Polarized sources 

An interesting case of multi-channel setting is the detection 
of extragalactic sources in polarization data. Mathematically 
the problem is not different from the general model given in 
section [HI but the particularities of polarization observations 
have motivated the appearance of some specific techniques 
apart from the ones already discussed. 

CMB polarization has been described as the next observa- 
tional frontier of cosmology. In the last few years, some meth- 
ods have been specifically developed to address the important 
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problem of detecting compact sources in polarization data. In 
this section, we will give a brief review of these methods, 
which have been applied to CMB simulations as well as to 
real data ll6TTl-ll63l. 

Polarization of light is conveniently described in terms 
of the Stokes parameters I, Q, U and V, (see l64l for an 
excellent review on CMB polarization). Q and U are the 
linear polarization parameters and V indicates the circular 
polarization. Whereas Q, U and V depend on the orientation 
of the receivers, the total polarization, defined as 

P = ^Q 2 + U 2 + V 2 , (34) 

is invariant with respect to the relative orientation of the re- 
ceivers and the direction of the incoming signal, and therefore 
has a clear physical meaning. This quantity can be treated 
as the modulus of a vector. Thus, the methods presented 
in (61] have to do with the study of a set of images which 
contain signals whose individual intensities can be considered 
as components of a vector, but where the quantity of interest 
is the modulus. First, we will consider the case of linear 
polarization, V = 0, given that the CMB is not circularly 
polarized in the standard cosmological models [64 1 . However, 
since some models predict a possible circular polarization l65l . 
we will comment briefly on this case later. 

If we have a compact source embedded in the data of Q 
and U, these can be expressed in the following way 

D QiU (x) = A QjU r(x) + n QjU (x). (35) 

with Aq^jj the compact source intensity in Q and U, r(x), 
the beam profile and uq } u(x) the corresponding noise in both 
components. The P-map, P(x) = (Dq(x) + Dfj(x)) 1 / 2 , is 
characterised by a source at the centre of the image with 
amplitude A = (Aq + Afj) 1 / 2 immersed in non-additive noise 
which is correlated with the signal. 

We assume the same beam profile for the images in Q and 
U, as well as a Gaussian independently distributed noise with 
zero mean and the same variance for Q and U. Given these 
typical conditions, the distribution of P if a source is present 
is the Rice distribution 

f(P\A) = ile-O^W/,, (a^J , (36) 

where a is the noise rms deviation and Io is the modified 
Bessel function of zero order. By using the Neyman-Pearson 
lemma (32), a filter is obtained which produces the maximum 
likelihood estimator of the amplitude, A. In the case of a 
pixelized image, we can write 

Y^i i h{A yi ) cr- 

where the indexes refer to the pixels and I\ the modified 
Bessel function of the first order. This filter is called the 
Neyman-Pearson filter (NPF). Alternatively, a matched filter 
can be applied to each image and then, with the two fil- 
tered images Qmf, Umf a non-linear fusion P = (Q 2 MF + 
Ump) 1 / 2 can be made pixel by pixel. This filter is called the 
filtered fusion (FF). 



Simulations with the Planck characteristics showed that 
the FF performed better than the NPF especially for low 
polarization fluxes. The FF outperformed the NPF when the 
power of the detections with a given significance and the flux 
and position estimation were compared. 

If the circular polarization is taken into account, the NPF 
and the FF can be also calculated. Indeed, the filters can 
be computed for the modulus of a vector with any number 
of components [63]. Simulations showed that in the case of 
circular polarization, the FF also produces better results than 
the NPF. 

Taking into account the results with simulations, (62l ap- 
plied the FF to the detection of polarized sources in the WMAP 
5 -year data. They detected, with a significance level greater 
than 99.99% in at least one WMAP channel, 22 objects, 5 of 
which were doubtful, since they did not have a plausible low 
frequency counterpart. These detections were a clear advance 
with respect to the 5 polarized sources listed by the WMAP 
team when they analysed the same data. The application of a 
filter targeted specifically for polarization detection had a clear 
influence on the improvement. 

The application of these methods to Planck data can be 
expected in the future. It would also be interesting to combine 
them with a Bayesian approach, so that some previous infor- 
mation about polarization properties of the sources could be 
taken into account. 

VIII. Final remarks 

CMB image processing is a relatively young area of re- 
search. Much work remains to be done. There is an incipient 
but resolved interest among CMB cosmologists to incorporate 
the newest ideas to solving the problem of compact sources 
in microwave images. 

In this paper we have reviewed the status of the different 
algorithms and methods that attempt the detection of extra- 
galactic compact sources (galaxies and clusters of galaxies) 
using multi-channel (that is, obtained by means of more than 
one single detector) data. This should not be confounded with 
the traditional band-merging approach to catalogue making. 
We included the case of detection in polarized data because 
it is formally equivalent to the detection in multi-frequency 
experiments. The techniques we have reviewed include linear 
fusion of images, different multi-channel filtering methods 
and Bayesian object detection. Although the area of research 
is young, all of these methods have already been applied 
to astronomical data sets with great success. In next years, 
however, we expect to see new interesting developments in 
the field. 

One of the most promising ideas for future research is 
related to the notions of sparsity and l p -approximations. For 
the particular case of point like objects, the idea of sparse 
dictionaries comes naturally [|66l . However, the full appli- 
cation to multi-channel CMB compact source detection has 
not been addressed yet. Wavelet techniques, that are very 
popular in single-frequency source detection, open another 
interesting possibility that has not been explored yet. The same 
applies to other space- scale representations such as Gabor 
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and Wigner-Ville transforms. Undoubtedly, both Bayesian and 
multi-filtering techniques will see substantial improvements 
in the next few years. All these new developments will be 
really come in handy for the new generation of many channel 
cosmological surveys such as the upcoming J-PAS0. 

Acknowledgement 

The authors would like to thank the editors of this Special 
Issue for their invitation to submit a contribution. We also 
thank Prof. Gianfranco De Zotti for his useful advice. DH 
and FA acknowledge partial financial support from the Spanish 
Ministerio de Ciencia e Innovation project AYA20 10-2 1766- 
C03-01. DH also acknowledges the Spanish Ministerio de 
Education for a Jose Castillejo' mobility grant with reference 
JC20 10-0096 and the Astronomy Department at the Cavendish 
Laboratory for their hospitality during the elaboration of this 
paper. FA also wants to thank the Astronomy Department 
at the Cavendish Laboratory for their hospitality during two 
short stays in the summer of 2011. The authors would like 
to thank Mike Hobson for his very useful contributions and 
for the wonderful, although very sharp and clear, descriptions. 
P. Carvalho is supported by a Portuguese fellowship (ref: 
SFRH/BD/42366/2007) from the Fundacao para a Ciencia e 
Tecnologia (FCT). 





Appendix 


x, y 


position vector 


k 


wave vector (Fourier) 


N 


number of channels 


D 3 


observed data at the j th channel 


S 3 


signal at the j th channel 


n 3 


noise at the j th channel 


*j 


source amplitude at the j th channel 


T l 


source profile at the j th channel 


T° 


intrinsic source profile (before smearing by the beam) 


h 


beam point spread function at the j th channel 


R 


scale parameter 


P 


cross-power spectrum matrix 


f 


spectral behaviour of the source 




matrix (or vector) of filters 


SNR 


signal-to-noise ratio 





vector of parameters (position, size, amplitude...) 


I,Q,U,V 


Stokes' polarization parameters 


P 


total polarization 


Io 


modified Bessel function of zero order 



TABLE I 

List of symbols used in this review 
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